library(readxl)
library(kableExtra)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ dplyr::group_rows() masks kableExtra::group_rows()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(vistime)
library(treemap)
# require(devtools)
# install_github("lchiffon/wordcloud2")
library(wordcloud2)
library(flextable)
## 
## Attaching package: 'flextable'
## 
## The following object is masked from 'package:purrr':
## 
##     compose
## 
## The following objects are masked from 'package:kableExtra':
## 
##     as_image, footnote
library(devtools)
## Loading required package: usethis
## Warning: package 'usethis' was built under R version 4.3.3
devtools::install_github("https://github.com/Genentech/phase1b")
## Using GitHub PAT from the git credential store.
## Skipping install of 'phase1b' from a github remote, the SHA1 (9fbc3baf) has not changed since last install.
##   Use `force = TRUE` to force installation
library(phase1b)

Gearing our Industry Statisticians up for Software success : A phase1b journey.

Audrey Yeo

  1. Opinions do not reflect employer

  2. This presentation has ALT text and as much as possible, uses colour-blind friendly palettes

  3. Code for this Quarto-rendered .html will also be shared

About phase1b

phase1b package history

  • 2015 : Started as a need in Roche’s early development group, package development led by Daniel Sabanés Bové in 2015.

  • 2023 : Refactoring, Renaming, adding Unit and Integration tests as current State-of-Art Software Engineering practice.

  • 100% written in R and Open Source.

  • Website : genentech.github.io/phase1b/

library(devtools)
devtools::install_github("https://github.com/Genentech/phase1b")
library(phase1b)

Clinical Trials Statisticians understand :

…the combined Scientific and Business development of a therapy.

My Journey at F. Hoffmann La-Roche

Why Software Engineering ?

Why find solutions when we can also build them ?

Why Software Engineering : A response to dynamic decision making in early development

control = 0.6
thetaT_low = 0.6
result <- predprob(
  x = 16, n = 23, Nmax = 40, p = control, thetaT = thetaT_low,
  parE = c(0.6, 0.4)
)

thetaT_high = 0.9
result_high_thetaT <- predprob(
  x = 16, n = 23, Nmax = 40, p = control, thetaT = thetaT_high,
  parE = c(0.6, 0.4)
)
data = rbind(result$table, result_high_thetaT$table)
data$thetaT = c(rep("60%", 18), rep("90%", 18))
df_thetaTlow <- data %>% filter(thetaT == "60%") 
df_thetaThigh <- data %>% filter(thetaT == "90%") 

ggplot(df_thetaTlow) +
  geom_point(aes(x = cumul_counts, y = posterior, colour = success)) +
  scale_x_discrete(limits = c(seq(23, 40, by = 1))) +
  xlab("\nFuture successful reponders") +
  ylab("Probability\n") +
  ggtitle("With P (RR > 0.6 | data ) > 60% : \n32 of 40 responders needed to achieve a Go decision")  +
  theme(text = element_text(size = 20)) +
  scale_color_manual(values = c("#D55E00", "#009E73"))
## Warning in scale_x_discrete(limits = c(seq(23, 40, by = 1))): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
These two graphs side by side show that by increasing the threshold for an Efficacy or Go decision, the number of responders out of 40 is higher with the higher threshold, ie 35 patients instead of 32 patients of 40

Predictive Posterior CDF for different Efficacy Rules

Skills needed : Covering the basics

  • Onboarding June + July 2023
    • R basics: coding and style
    • Good function writing in R
    • R debugging
    • R advanced: package development
      • Basics of Package Development in R
      • roxygen2 introduction another resource
      • Good Software Engineering Practice for R Packages

in summary, gaps were : Git merging, Pull Requests, Styling, Debugging, Writing Tests

New beginnings :

Knowledge and skills for Statistical Software Engineering!

  • Visualing branches : Git kraken, VS Code Git Graph Extension
  • Checking : pre-commit, Git hub checks, R CMD
  • Documentation : Writing (roxygen2), building, reviewing documentation
  • Testing : Taking a reviewer perspective (testthat and checkmate)
  • Styling : styler, prettier … building your own style

What is refactoring ?

knitr::include_graphics("images/Peranakan_bowl.jpg", error = FALSE)

see sub cap

Tools for refactoring

knitr::include_graphics("images/typed_asserts.png", error = FALSE)
see sub cap

Roxygen skeleton example for one user-facing function in phase1b

Tools for refactoring

  • Variables are typed defined from package roxygen2 and it’s type asserted from package checkmate
  • Unit and integration testing takes on the perspective of a reviewer and to ensure the function calls calculates what it says it calculates (they are my favourite). Package testthat and checkmate are used here
  • Making phase1b a State of Art Software1 - reproducible, robust, testable, intuitive and open to collaboration
library(roxygen2)
library(devtools)
roxygen2::roxygenise() # converts roxygen comments to .Rd files
devtools::document()  # R converts .Rd files to human readable documentation
# then Ctrl + Shift + D, if you’re using RStudio.

What is debugging ?

To find a needle in the haystack systematically

  • Takes the most time and is the most difficult
  • Needs a lot more courage than creativity
  • Maintains humility

embrace small steps as smaller PRs can be sizeABLE.

# | echo = false
# | eval = false
debugonce()
debug()
undebug()
options(error = recover)
options(error = NULL)

Polishin’ the vintage function - A checklist approach

  1. One Design Document for entire function

  2. One issue is the smallest task from (1)

  3. One issue to one branch

  4. One issue to one Pull Request (PR)

  5. Create Test files for helper functions

  6. Create Test files and example files for main function calls

The Design Document

knitr::include_graphics("images/Design_doc_ocPost.png")

User-facing functions start with Design-document

  1. Helps achieve Clarity on the form and purpose of the user-facing function

  2. Test regular and edge cases

  3. Makes the rest of the work “easier” when the goals are clear

  4. Most of the “skeleton” and “flesh” of the work can already be done in the because of 1-3

First successes


  • First Pull Request (PR) merged on Aug 29 2023 🥳

  • submitted an abstract by November 2023 at PSI

knitr::include_graphics("images/abstract.png")

The phase1b journey in Pull Requests

data <- read.csv("phase1b_12mo_PR.csv", sep = ",")
data <- data[1:20,]
showcase_PR <- data.frame(`Pull request`= data$Pull.request,
                          Order= data$order)
names <- c("Pull Request", "Order of work")
# dimnames(showcase_PR) = list(1:32, names)
dimnames(showcase_PR) = list(1:20, names)
showcase_PR %>%  kbl(align = "c") %>% kable_styling(font_size = 14) %>% kable_classic(lightable_options = "hover", html_font = "\"Source Sans Pro\", helvetica, sans-serif") 
Pull Request Order of work
test for dbetabinom 1
pbetaMix and qbetaMix 2
postprob 3
ocPostprob 4
betadiff 5
postprobDist 6
getBetaMix and dbetaMix 7
predprob 8
Design doc for predprobDist 9
h_predprobdist_single_arm 10
h_predprobdist 11
predprobDist 12
h_get_decisionDist 13
h_getbetaMixpost 14
Design-doc_ocPredprob 15
h_get_decision_one_Predprob 16
h_get_decision_two_Predprob 17
h_get_oc 18
ocPredprob 19
Design-doc_ocPredprobDist 20

The phase1b journey in Commits

data <- read.csv("phase1b_12mo_PR.csv", sep = ",")

# treemap 2
treemap(data,
        index ="order",
        vSize="commits",
        type="index",
        mirror.y = TRUE,
        title = "Commits decrease with later PRs",
        algorithm = "pivotSize",
        fontfamily.title = "helvetica",
        fontfamily.labels = "helvetica",
        border.lwds = 0.5
)

The phase1b journey in Chats

data <- read.csv("phase1b_12mo_PR.csv", sep = ",")

# treemap 2
treemap(data,
        index ="order",
        vSize="chats",
        type="index",
        mirror.y = TRUE,
        title = "Chats decrease with later PRs",
        algorithm = "pivotSize",
        fontfamily.title = "helvetica",
        fontfamily.labels = "helvetica",
        border.lwds = 0.5
)

Features of Statistical Software Engineering

  • Moving parts, collaboration is must
  • Alignment of Forest vs Trees perspective
  • Requires focused and deep thinking
  • Small steps are good starts e.g. small PRs
  • Creative work comes after foundation
  • Systematic deduction is key to debugging
  • Lots of theory and reading literature about new methodology
  • Good engineering practices are habits, needs automation which needs practice which needs time

Good practices for new starter

Goal : introduce good practices and improve muscle memory and collaborate

  • Mindset : Being dependent on others is normal. Asking questions are key because there are also many moving parts. Group chats are valuable in a safe space to ask questions.
  • Regular stand ups
  • Taking Small steps : are good starts
  • Taking Systematic steps : no matter how slow
    • e.g. debug(), undebug(), options(recover = error), options(recover = NULL)
  • older PRs are helpful to learn and reflect

Good practices for the Engineering Lead

Goal : introduce good practices and improve muscle memory and collaborate

  • Patience
  • Quick turnover for feedback
  • Positive feedback but no over positiveness, always based on facts
  • Use chats in PR can be helpful to precise the feedback
  • Demonstrate small and systematic deduction during debugging
  • Acknowledge learning styles can be different
  • Allow developers to do the work, make mistakes and keep sharing rationale

Conditions for success from the phase1b work


  • Positive bias & Trust are win-win situations

  • Learning through mistakes is key

  • Safe space to be ask any questions, and iterate, even for a seasoned engineer

Kinds of feedback


  • “Great motivation to learn new skills”

  • “… it was always fun and easy to work with Audrey”

How I want to be a better Software Engineer

  • Develop the skills of slow and systematic deduction (efficiency gains)
  • Create more conditions for focused, systematic, deep and creative work
  • Perceive small steps as good starts
  • Continue having an open mind on what good practices are
  • Be more patient with the process
  • Fall on the engineering principles (testing, debugging, iteration, reliability, correctness…)

Sharing the Success story: Touring phase1b

  • PSI June 2024 conference in Netherlands link
  • useR!2024 Salzburg link
  • PHUSE September 2024 in Basel link
  • University of Basel November 2024 link
knitr::include_graphics("images/PSI_Stand.jpg", error = FALSE)

Future outlook


  • Go through several more issues to complete and make it more delightful

  • submission to CRAN

  • Collaborate by contacting me

More software work for statisticians with values of : Inclusion, Diversity, Impact and Delightful user experience

Final Reflections

  • I started off as a Clinical Trial Statistician (with good technical skills) but doing something I wasn’t prepared to do paid off
  • There were new skills to gain, but it was always Confidence through Competence
  • Can use this as a reference to build other packages and in other languages e.g. Julia, see my blog post

As in Statistics, no matter how much you’ve improved it’s not permanent

knitr::include_graphics("images/hex6.png")

knitr::include_graphics("images/hex3.png")

knitr::include_graphics("images/hex9.png")

knitr::include_graphics("images/hex4.png")

knitr::include_graphics("images/hex5.png")

Thank you


  • Daniel Sabanés Bové

  • …and other colleagues at Data Science Acceleration at Roche

  • Open Source community and R community that do great work and share their knowledge

I’d love to know how this presentation relates to you or does not !

Some references

  • Thall P F, Simon R (1994), Practical Guidelines for Phase IIB Clinical Trials, Biometrics, 50, 337-349

  • Lee J J, Liu D D (2008), A Predictive probability design for phase II cancer clinical trials, 5(2), 93-106, Clinical Trials

  • Yeo, A T, Sabanés Bové D, Elze M, Pourmohamad T, Zhu J, Lymp J, Teterina A (2024). Phase1b : Calculations for decisions on Phase 1b clinical trials. R package version 1.0.0, https://genentech.github.io/phase1b

  • Code for this presentation

Some more references

  • Code with engineering. Microsoft

  • How to do a code review. Google

  • Pachecho, C, A Technical Journey into API Design-First: Best Practices and Lessons Learned link

  • Why We need to Improve Software Engineering in Biostatistics (October 26 2023 R/Pharma) link

  • Inclusive Speaker Course by Linux Foundation link

  • license


  1. Subject to a change in definition with better tools and practices↩︎